8th November, 2019

Prediction versus Intervention

  • In all previous examples, we can achieve good to excellent predictions
  • But we do not merely want to predict systems, but also change them!

  • Teaser:

\[ \begin{aligned} X &:= \epsilon_X \\[.5em] Y &:= X + \epsilon_Y \\[.5em] Z &:= Y + \epsilon_Z \enspace , \end{aligned} \]

  • with \(\epsilon_X, \epsilon_Y \stackrel{\text{iid}}{\sim} \mathcal{N}(0, 1)\) and, independently, \(\epsilon_Z \sim \mathcal{N}(0, 0.1)\)

  • Does \(X\) or \(Z\) predict \(Y\) better?

Another Teaser

  • 350 patients chose to take a drug and 350 chose not to
  • Given these data, should a doctor prescribe the drug to a patient?

Outline

  • 1) Motivation
    • Spurious Correlations
    • Modeling for Interventions

  • 2) Causal Inference
    • Seeing versus Doing
    • Structural Causal Models
    • Causal Effects and Confounding

  • 3) Simpson’s Paradox

Causal Inference

The Causal Hierarchy

Seeing

  • Directed Acyclic Graphs (DAGs) visualize (conditional) independencies

Seeing clearly using d-separation

  • d-separation allows us to read off (conditional) independencies from any DAG
  • We need to define a few concepts first:
    • A path \(\omega\) from \(X\) to \(Y\) is a sequence of nodes & edges such that the start & end nodes are \(X\) and \(Y\)
    • A conditioning set \(\mathcal{L}\) is the set of nodes we condition on (it can be empty)
    • A path is blocked if it contains a non-collider that is in \(\mathcal{L}\), or a collider that is not in \(\mathcal{L}\) and has no descendant in \(\mathcal{L}\)
    • Conditioning on a collider or any of its descendants unblocks a path (e.g., \(X \rightarrow Z \leftarrow Y\))

  • Two nodes \(X\) and \(Y\) are d-separated by \(\mathcal{L}\) if and only if members of \(\mathcal{L}\) block all paths between \(X\) and \(Y\)
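As a sketch of how these rules can be checked mechanically (assuming the bnlearn package, which the exercises also use), we can query d-separation on a chain and on a collider:

```r
library(bnlearn)

# Chain: X -> Y -> Z
chain <- model2network("[X][Y|X][Z|Y]")
dsep(chain, "X", "Z")       # FALSE: the path X -> Y -> Z is open
dsep(chain, "X", "Z", "Y")  # TRUE: conditioning on Y blocks it

# Collider: X -> Z <- Y
collider <- model2network("[X][Y][Z|X:Y]")
dsep(collider, "X", "Y")       # TRUE: the collider Z blocks the path
dsep(collider, "X", "Y", "Z")  # FALSE: conditioning on Z unblocks it
```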

Doing

  • Number of storks \(X\) is independent of birth rate \(Y\) given environmental development \(Z\)
    • This can be visualized in three different, Markov equivalent DAGs!
    • Doing could mean for example setting \(Z\) to some value \(z\) (hard intervention)

Structural Causal Models

Structural Causal Models

  • Structural Causal Models (SCMs) are the fundamental building blocks of causal inference
    • We understand relations between variables in an SCM to be causal
    • Here, we will assume acyclic, linear SCMs with independent Gaussian error terms
    • The relations in an SCM can be visualized in a directed acyclic graph

  • Example:

\[ \begin{aligned} X &:= \epsilon_X \\[.5em] Y &:= X + \epsilon_Y \\[.5em] Z &:= Y + \epsilon_Z \enspace , \end{aligned} \]

  • with \(\epsilon_X, \epsilon_Y \stackrel{\text{iid}}{\sim} \mathcal{N}(0, 1)\) and, independently, \(\epsilon_Z \sim \mathcal{N}(0, 0.1)\)

  • Note the factorization of the joint distribution:

\[ p(X_1, \ldots, X_n) = \prod_{i=1}^n p(X_i \mid \text{Parents}(X_i)) \hspace{6em} p(X, Y, Z) = p(Z \mid Y) \, p(Y \mid X) \, p(X) \]
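A quick numerical check of this factorization is possible in base R. The covariance matrix below is derived from the SCM, with \(\epsilon_Z\) given standard deviation 0.1 as in the simulation code:

```r
# Density of a mean-zero multivariate normal, written by hand (base R only)
dmvn <- function(x, Sigma) {
  k <- length(x)
  drop(exp(-0.5 * t(x) %*% solve(Sigma) %*% x) / sqrt((2 * pi)^k * det(Sigma)))
}

# Covariance of (X, Y, Z) implied by the SCM:
# Var(X) = 1, Var(Y) = 2, Var(Z) = 2.01, Cov(X, Y) = Cov(X, Z) = 1, Cov(Y, Z) = 2
Sigma <- matrix(c(1, 1, 1,
                  1, 2, 2,
                  1, 2, 2.01), nrow = 3, byrow = TRUE)

pt <- c(0.5, 1.0, 1.2)  # an arbitrary point (x, y, z)
p_joint <- dmvn(pt, Sigma)
# p(x) * p(y | x) * p(z | y), using the conditional densities from the SCM
p_factored <- dnorm(pt[1]) * dnorm(pt[2] - pt[1]) * dnorm(pt[3] - pt[2], sd = 0.1)

all.equal(p_joint, p_factored)  # TRUE
```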

Structural Causal Models

Structural Causal Models

set.seed(1)

# Simulate n = 100 observations from the SCM above
n <- 100
x <- rnorm(n, 0, 1)       # X := eps_X
y <- x + rnorm(n, 0, 1)   # Y := X + eps_Y
z <- y + rnorm(n, 0, 0.1) # Z := Y + eps_Z
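This simulation also answers the teaser from the beginning: since \(Z\) is a near-copy of \(Y\), it predicts \(Y\) far better than \(X\) does, even though intervening on \(Z\) leaves \(Y\) untouched. A minimal sketch:

```r
set.seed(1)

# Same SCM as above
n <- 100
x <- rnorm(n, 0, 1)
y <- x + rnorm(n, 0, 1)
z <- y + rnorm(n, 0, 0.1)

# Z is almost a copy of Y and thus predicts it nearly perfectly,
# even though changing Z has no effect on Y
summary(lm(y ~ x))$r.squared  # around 0.5
summary(lm(y ~ z))$r.squared  # close to 1
```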

Causal Effect and Confounding

  • The interventional distribution \(p(Y \mid do(Z))\) is the observational distribution \(p_m(Y \mid Z)\) in the manipulated DAG
  • For linear, acyclic SCMs we define the average causal effect:

\[ ACE(Z \rightarrow Y) = \mathbb{E}\left[Y \mid do(Z = z + 1) \right] - \mathbb{E}\left[Y \mid do(Z = z) \right] \enspace . \]

  • Confounding: The causal effect of \(Z\) on \(Y\) is confounded if \(p(Y \mid Z) \neq p(Y \mid do(Z))\).
set.seed(1)

get_ce <- function(zvalue) {
  n <- 100
  x <- rnorm(n, 0, 1)
  y <- x + rnorm(n, 0, 1)
  z <- zvalue # do(Z = zvalue): Z is set by intervention; X and Y are untouched
  
  mean(y) # E[Y | do(Z = zvalue)]
}

(ACE <- get_ce(1) - get_ce(0))
## [1] -0.0101961

Causal Effect and Confounding

Causal Effect and Confounding

  • From the SCM, we see that \(ACE(X \rightarrow Y) = 1\)
  • This corresponds to what we find by intervening on \(X\):
set.seed(1)

get_ce <- function(xvalue) {
  n <- 100
  x <- xvalue # do(X = xvalue): X is set by intervention
  y <- x + rnorm(n, 0, 1)
  z <- rnorm(n, 0, 0.1)
  
  mean(y) # E[Y | do(X = xvalue)]
}

(ACE <- get_ce(1) - get_ce(0))
## [1] 1.079214

Causal Effect and Confounding

Valid Adjustment Sets

  • In observational data, there will always be confounding factors
    • What variables should we adjust for?
    • This requires knowledge about the underlying DAG
    • (Don’t just adjust for all variables — that can induce bias!)

  • Backdoor Criterion (Pearl, Glymour, & Jewell, 2016, p. 61): An adjustment set \(\mathcal{Z}\) fulfills the backdoor criterion if no member of \(\mathcal{Z}\) is a descendant of \(X\) and the members of \(\mathcal{Z}\) block all backdoor paths between \(X\) and \(Y\), that is, all paths with an arrow pointing into \(X\). Adjusting for \(\mathcal{Z}\) then yields the causal effect of \(X \rightarrow Y\).

  • Rationale:
    • We block all spurious paths between \(X\) and \(Y\)
    • We leave all directed paths from \(X\) to \(Y\) unperturbed
    • We create no new spurious paths
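As an illustration (a made-up SCM, not one from the slides): a confounder \(C\) opens a backdoor path between \(X\) and \(Y\), and adjusting for \(\{C\}\), which fulfills the backdoor criterion, recovers the causal effect:

```r
set.seed(1)

# Illustrative confounded SCM:
# C := eps_C;  X := C + eps_X;  Y := X + C + eps_Y, so ACE(X -> Y) = 1
n <- 10000
c_conf <- rnorm(n)
x <- c_conf + rnorm(n)
y <- x + c_conf + rnorm(n)

# The open backdoor path X <- C -> Y biases the unadjusted estimate
coef(lm(y ~ x))["x"]           # around 1.5, not 1
# {C} fulfills the backdoor criterion; adjusting recovers the causal effect
coef(lm(y ~ x + c_conf))["x"]  # around 1
```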

Recap

  • There is a causal hierarchy:
    • Seeing: Directed Acyclic Graphs (DAGs) encode conditional independencies
    • Doing: causal DAGs encode interventional distributions (do-operator)

  • In observational settings, confounding variables abound
  • Causal DAGs allow us to derive valid adjustment sets

  • Causal inference goes beyond prediction by modeling the outcome of interventions
  • Structural Causal Models (SCMs) are the building blocks of (this flavour of) causal inference
    • They encode observational as well as interventional distributions

Simpson’s Paradox

Recovery Rate and Gender

  • 350 patients chose to take a drug and 350 chose not to
  • Given these data, is the drug helpful or harmful?
  • Should a doctor prescribe the drug to a patient?

Recovery Rate and Blood Pressure

  • 350 patients chose to take a drug and 350 chose not to
  • Given these data, is the drug helpful or harmful?
  • Should a doctor prescribe the drug to a patient?

Simpson’s Paradox

  • The data are exactly the same in both cases
  • Statistics alone cannot provide an answer
  • We need to have an understanding of the causal mechanism behind these data
  • We can visualize this causal mechanism using directed acyclic graphs (DAGs)

Recovery Rate and Gender

  • Suppose we know that estrogen has a negative effect on recovery
  • Note also that more women choose the drug than men
  • Therefore, being a woman has an effect on drug taking as well as recovery
    • Should condition on gender!
    • This blocks the backdoor path \(D \leftarrow G \rightarrow R\)
    • And therefore unconfounds the effect \(D \rightarrow R\)
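A sketch of this story with made-up numbers (not the actual slide data): the drug helps within each gender, yet looks harmful in the aggregate because women both take the drug more often and recover less often:

```r
set.seed(1)

# Illustrative simulation: g = 1 (woman) lowers recovery and raises
# drug uptake, while the drug d itself helps
n <- 10000
g <- rbinom(n, 1, 0.5)
d <- rbinom(n, 1, ifelse(g == 1, 0.8, 0.2))
r <- rbinom(n, 1, plogis(1 * d - 2 * g))

# Marginally, the drug looks harmful...
coef(glm(r ~ d, family = binomial))["d"]      # negative
# ...but adjusting for gender blocks D <- G -> R and reveals its benefit
coef(glm(r ~ d + g, family = binomial))["d"]  # around 1
```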

Recovery Rate and Blood Pressure

  • Blood pressure is measured after taking the drug
    • It cannot cause choosing the drug
    • Instead, it is a mechanism of how the drug works
    • Should not condition on blood pressure!
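Again as a made-up sketch (not the slide data): if blood pressure \(B\) is a mediator on the path \(D \rightarrow B \rightarrow R\), adjusting for it removes exactly the effect we want to measure:

```r
set.seed(1)

# Illustrative simulation: the drug D works entirely through
# blood pressure B, so the total effect of D on R is 1
n <- 10000
d <- rbinom(n, 1, 0.5)
b <- d + rnorm(n) # mediator: D -> B
r <- b + rnorm(n) # recovery: B -> R

coef(lm(r ~ d))["d"]      # around 1: the total causal effect
coef(lm(r ~ d + b))["d"]  # around 0: conditioning on the mediator
                          # blocks the very path we are interested in
```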

Discussion

  • The previous examples used observational data
  • Randomized controlled trials (RCTs) allow causal statements because they remove confounding
  • However, many real life examples do not allow for RCTs
    • In theory, we can make causal statements even from observational data
    • Causal graphs can help us navigate this mess
    • In practice, difficult to arrive at the “correct” causal graph

Advertisement

Exercises

Exercise I

  • Come up with two different examples for each of these graphs

Exercise II

  • List all marginal as well as conditional independencies shown in this graph

Exercise III

  • Draw the DAG induced by this Structural Causal Model:
set.seed(1)

n <- 1000
x <- rnorm(n, 0, 1)
y <- rnorm(n, 0, 1)
z <- x + y + rnorm(n, 0, 1)
  • Can you explain what happens here?
coef(lm(y ~ 0 + x))
##           x 
## 0.006608738
coef(lm(y ~ 0 + x + z))
##          x          z 
## -0.5262511  0.5047807

Exercise IV

  • Draw the DAG induced by this SCM:

\[ \begin{aligned} S &:= \epsilon_S \\[.5em] T &:= S + \epsilon_T \\[.5em] U &:= W + \epsilon_U \\[.5em] V &:= 0.90 Z + \epsilon_V \\[.5em] W &:= 0.50 S + 1 X + 3 Y + \epsilon_W \\[.5em] X &:= Z + \epsilon_X \\[.5em] Y &:= 1.50V + X + \epsilon_Y \\[.5em] Z &:= \epsilon_Z \end{aligned} \]

  • where

\[ (\epsilon_S, \epsilon_T, \epsilon_U, \epsilon_V, \epsilon_W, \epsilon_X, \epsilon_Y, \epsilon_Z) \stackrel{\text{iid}}{\sim} \mathcal{N}(0, 1) \]

Exercise V

  • Using the previously drawn DAG, mark the following statements as either correct or incorrect:
    • Regressing \(T\) onto \(Y\) yields no effect (Note: \(T\) onto \(Y\) means ‘lm(t ~ y)’)
    • Regressing \(T\) onto \(Y\) while adjusting for \(W\) yields no effect
    • \(W\) and \(T\) are marginally independent
    • \(Y\) is a collider on the path from \(X\) to \(U\) that goes through \(Y\)
    • \(Y\) is independent of \(Z\) given \(X\) and \(V\)
    • \(S\) is independent of \(Y\) given \(U\)
    • \(T\) and \(U\) are marginally independent; \(T\) and \(Y\) are marginally independent
    • \(X\) and \(S\) are marginally independent
    • \(X\) is independent of \(U\) given \(W\); \(X\) is independent of \(V\) given \(Z\)
    • \(X\) is independent of \(V\) given \(Z\) and \(U\)
  • Generate \(n = 2000\) observations from the previous DAG
    • Check your above answers by running the relevant regressions / conditional independence tests
    • You can use the ‘ci.test’ function from the ‘bnlearn’ package

Exercise VI

  • Use the previous DAG for these questions
  • Compute \(\mathbb{E}\left[W \mid X = x\right]\)
    • Does it correspond to the causal quantity in the SCM? Why / why not?

  • Compute \(\mathbb{E}\left[W \mid do(X = x)\right]\)

  • What is the valid adjustment set to compute the causal effect \(\mathbb{E}\left[U \mid do(Z)\right]\)?

  • Visualize the marginal distribution of \(W\) under the interventional DAG \(Y := 2\)
    • Compare it with the marginal distribution under the observational DAG
    • What do you observe? Why?

  • Implement the intervention \(Y := y\) for \(y \in [0, 4]\)
    • What do you expect \(\mathbb{E}\left[W \mid do(Y = y)\right]\) will look like as a function of \(y\)? Why?
    • What do you expect \(\text{Var}\left[W \mid do(Y = y)\right]\) will look like as a function of \(y\)? Why?
    • Visualize both!

Exercise VII

  • Download the (made-up) observational data set about mental health, exercise, and age from here.

  • Visualize and analyze the data. What do you observe?
  • Draw a DAG that could underlie these data. Which analysis is the correct one?